Implementing the OLA Format

Delay sits at the center of the DSM design closure problem. Close examination of the problem strongly suggests adopting OLA as a format that's both accurate and functional. By Keith Peshak


Is deep-submicron (DSM) closure a real problem? Perhaps so, if only because designers of today's complex chips are using tools based on "minimum-reliable" delay estimation models. These models have two purposes: to make the design simulation run fast, and to keep the delay information inside the EDA tool, which represents each hardware connection, compact and manageable.

The length of the hardware-design schedule and the number of design-cycle iterations (the measures of design-process efficiency) are determined by how well the tools' representation of reality holds up when measured against reality itself. That sounds circular, but it makes sense on reflection. The end result of measuring reality, both actual and modeled, is a need for excessively conservative design timing and for sign-off at each output of each design automation tool. Our only solutions to the DSM closure problem are guard banding (designing to tighter constraints than are necessary for yield) at every stage of the design process and searching for the best set of EDA tools.

So how do you calculate delay in a way that provides the truth to the IC designer, whose only concern is whether the IC is going to work? Consistency between heavily guard-banded tools isn't the answer, particularly when it gives the designer the wrong answer.

The concurrent question is how to build an envelope within which an EDA tool vendor can use the same tool architecture, independent of the delay-calculation formula required by the various lambda-rule technologies.

OLA (Open Library Application Programming Interface) provides the enabling method to reach that end. It is a standard, it provides accurate delay information and an accurate library format for EDA tools, and it works.

In further detail, OLA stands for:

  • Open - Every tool can benefit from a single library, built to a standard.

  • Library - We still use the synthesizer-chosen driving cell, and the simulation tool will place an access call to obtain delay.

  • API - Application Program Interface

all within which we wrap the delay, power, and signal-integrity algorithms. We calculate the real delay, based on the important parameters as they become known. This calculation approaches analog-simulator results and allows the addition of signal-integrity algorithms. The algorithm that the EDA tool uses from the Open Library API is the choice of the library vendor.

The delay problem

The EDA tool industry took a piecewise-linear approach to the problem of calculating digital delay. It moved to a new model at 1 micron, which seemed to last to about 0.5 micron. With smaller geometries, however, new algorithms are required; we are stuck at about 0.18 micron right now. To handle signal-integrity issues, additional algorithms are also needed, and this aspect of the problem is growing exponentially as we move further into the deep submicron. Clearly, we can't stay in business by following the previous model. Efficiency is important, but the EDA industry's minimum-calculation-time strategy of estimating gate delay by table look-up from a compiled library model needs improvement.
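To make the look-up strategy concrete, here is a minimal sketch (not the .lib or OLA mechanism itself) of the kind of two-dimensional delay table a tool might compile from a library: gate delay indexed by input slew and output load, interpolated between characterized points. All table values, indices, and names below are hypothetical.

#include <stdio.h>

/* Hypothetical 3x3 delay table (ns), indexed by input slew (rows)
   and output load (columns), as a compiled library model might hold. */
static const double slew_idx[3] = {0.05, 0.20, 0.80};   /* ns */
static const double load_idx[3] = {0.01, 0.05, 0.20};   /* pF */
static const double delay_tbl[3][3] = {
    {0.08, 0.12, 0.25},
    {0.10, 0.15, 0.30},
    {0.18, 0.24, 0.42},
};

/* Bilinear interpolation between the four surrounding table entries. */
static double lookup_delay(double slew, double load)
{
    int i = (slew <= slew_idx[1]) ? 0 : 1;
    int j = (load <= load_idx[1]) ? 0 : 1;
    double ts = (slew - slew_idx[i]) / (slew_idx[i + 1] - slew_idx[i]);
    double tl = (load - load_idx[j]) / (load_idx[j + 1] - load_idx[j]);
    return (1 - ts) * (1 - tl) * delay_tbl[i][j]
         + ts       * (1 - tl) * delay_tbl[i + 1][j]
         + (1 - ts) * tl       * delay_tbl[i][j + 1]
         + ts       * tl       * delay_tbl[i + 1][j + 1];
}

int main(void)
{
    /* A point between the characterized corners of the table. */
    printf("delay = %.3f ns\n", lookup_delay(0.10, 0.03));
    return 0;
}

The speed of this approach is exactly why the industry adopted it; the accuracy problem is that the table has no way to represent the effects described next.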

Figure 1 - OLA replaces SDF
The elimination of SDF files and the tying together of all the tools are reflected here.

The basic EDA design tool architecture splits the electronic representation between the design (represented in a design information format) and the target fabrication technology's cell library (produced for a synthesis tool and simulation tool, and represented in a library information format such as the Synopsys .lib format). Boolean logic canonical forms, registers, bus connections, and clock trees make up the design information format. The library information format carries the technology information, much like a TTL data book of the fab's cell offerings. It contains two major parts: the gates (two- and four-input NAND, not three or five) and the registers (D with synchronous reset) offered by the silicon provider, as well as delay and drive information.

The silicon industry once faced the necessity of providing the library information database in a different form for each EDA tool. Everyone had their own proprietary format needs for this information, and everyone had their own opinion of what constituted a technology parameter. Standardization on the Synopsys .lib format is helping to solve that problem. However, because each tool runs its own delay calculator (analysis engine) against the .lib data, inconsistencies arise at the post-layout level when determining Ceffective (the effective capacitance of a load on a driving gate), interconnect delay, and slew.

Today the EDA tool industry must choose which calculation algorithm to use for delay. Historically, we used the concept that gates have propagation delay and wires do not. While this is clearly incorrect, it worked well enough; in reality, wires and connected gate inputs add capacitance that the driving gate's resistance must charge. For a few years, the problem was solved by adding a second component, incremental wire delay, to the gate-delay look-up model. The design information format told the simulator about the connection; the tool performed the library-information-format look-up of the transmitting cell's delay. We asked every tool vendor to enforce consistency through the use of the SDF file to add things like wire delay.

Surprisingly, we use the SDF file to enforce consistency between tools, but we aren't enforcing accuracy!

Today, again, that concept begins to fail miserably as we move toward the "physics of the small." Our problem continues to be that the basic nature of the components of a logic delay-from a gate, down a wire, to another gate-changes as the minimum-resolvable dimension (lambda) diminishes. Let's look at an example.

Gate-delay calculation and signal-integrity analysis have been improved by reverting to basic electrical-engineering equations. Gates are composed of transistors, which have resistance and capacitance.

Wires are composed of resistance and capacitance and connect to loads, which are composed of capacitance. The Spice model better defines the reality:

  • T = RC ln((Vss - Vi)/(Vss - Vf))

  • R = transmitting gate and wire resistance

  • C = gate and load and wire capacitance

  • Vss = power supply rail headed for (steady state)

  • Vi = load gate voltage when this all started (initial)

  • Vf = load gate logic transition threshold voltage (final): 0.2 Vdd for N-channel, 0.8 Vdd for P-channel enhancement CMOS
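As a quick numeric check of the charging equation above, the following sketch plugs in hypothetical values (a 500-ohm driver-plus-wire resistance, 200 fF of total load, a 1.8-V rail, and a threshold at mid-rail). It illustrates the formula itself, not any particular library's algorithm.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical values for one driver/wire/load segment. */
    double R   = 500.0;        /* effective driver + wire resistance (ohms) */
    double C   = 200e-15;      /* total gate + load + wire capacitance (F)  */
    double Vss = 1.8;          /* steady-state rail being driven toward (V) */
    double Vi  = 0.0;          /* initial load-gate voltage (V)             */
    double Vf  = 0.9;          /* threshold voltage to be crossed (V)       */

    /* T = R*C*ln((Vss - Vi)/(Vss - Vf)) */
    double T = R * C * log((Vss - Vi) / (Vss - Vf));

    printf("delay = %.3g s (%.1f ps)\n", T, T * 1e12);
    return 0;
}

With these numbers the segment contributes roughly 69 ps, and every term in the expression changes as the placement and routing change, which is why the calculation must be revisited as those parameters become known.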

This is "truth," but there is another "truth" as well. Tightly packed wires have mutual (capacitive and inductive) couplings, which make the apparent Vf a function of time (what is going on over the adjacent wire). Today, this is a separately treated signal-integrity issue. Equally true: long wires at a high data rate have characteristic impedance. Mismatch the impedence at the source or load end, and it may take many passes of the electromagnetic wave back and forth across the wire to affect a logic transition at the receiving gate. Not T, rather x*T, at a 1 GHz clock, where in place of x, it would be nice to know which odd integer.

There is yet another future signal-integrity issue: the stair-stepped rising/falling edge. As the small becomes smaller, other truths will become more significant than they were in the past. Treating each of them as a separate issue may not be the correct path to take. The .lib format doesn't accommodate this new truth.

OLA tackles DSM closure

The challenge is to approach the truth of an analog simulation with the digital EDA simulation tool architecture. This must be done in such a way as to make use of a compiled library model of technology target cells, to eliminate the run time of the analog simulation tool, and to address the additional signal integrity issues.

Additionally, how do you deal with the fact that the SDF file holds all tools captive in order to enforce consistency? OLA can position accurate library information for all EDA tools, without the need for SDF files (see Figure 1). Thus, it enables each tool to get the right answer for delay and signal integrity. The critical basic question for the future: when an EDA tool company implements OLA, it gets a tool that tells the designer what the delay is, plus or minus three percent. The designer may then discover that he has a six percent problem in the design. Is it a good idea, then, for the designer to rethink his design because his RTL architecture is bad? The answer is "yes," and the earlier in the design process (first design cycle), the better. Better still, have the tool revisit synthesis and fix the problem nets without bothering the engineer.

Although IC design tends to be scheduled as a linear sequence, the actual design process isn't. The iteration process can make a designer think, "If I had known that, I would have done this whole design differently." These iterations occur at post-placement and again after detailed routing, and they can take the designer all the way back to the RTL specification. Today, nothing can be done about that. Since more iteration yields a better final design, a technology that efficiently allows, or even encourages, iterating toward the delay minimum could bring significant benefits (see Figure 2).

OLA is right for this job, encouraging the tool to conduct iterations all the way back through original synthesis on problem nets without bothering the engineer while doing so internally. (The engineer actually perceives fewer design iterations.) OLA can even enable iterations back to the behavioral-compiler level, by feeding back added constraints discovered about a failed RTL architecture.

The synthesis tool in hardware design is like the assembly-language tool in software design: it removes the errors introduced when a human has to remember too many details. OLA improves upon it by enabling the timing analysis to be done correctly, and this will continue even as the technology changes. OLA also enables the internal iteration callback to synthesis to fix structural deficiencies on problem nets.

Figure 2 - Meeting the challenge
This model describes the delay minimum as more iterations take place. There is a huge difference between estimated performance at design time and actual performance at layout. Iterations help to bridge the gap, but if you go too far, things get worse again. The challenge is to find that minimum, which is something a computer can do if it is allowed to control the iterations automatically.

Once designers are comfortable specifying architecture and constraints and thinking at a higher design level, OLA can enable the feedback of newly discovered constraints at post-placement and at post-route. This supports higher-level language tools in hardware design by feeding "can't do that, because..." information from the post-detailed-route stage all the way back to the EDA architectural-level tool, which receives it as a newly discovered design constraint. OLA allows the architectural-level tool to deal with why the last RTL Verilog specification couldn't be made to work, and to suggest alternatives to the designer. This is a manual step that involves the designer, but it is also a new, less painful way to alter a defective RTL design that the synthesis tool is unable to restructure for appropriate timing. This happens now, only much more slowly and much later in the design schedule. OLA enables a more efficient solution to the problem.

What we discovered with behavioral compilers is that we could insist on adhering to established definitions of the smallest design (for a complex-number multiplier: 1 multiplier, 1 adder, 7 clocks) and the fastest design (2 multipliers, 2 adders, 5 clocks).

Coincidentally, the architectural tool could beat both parameters. It would be nice to have that opportunity again, if the situation could be remedied.

You can't put this information in an SDF file, and it has nothing to do with a .lib file. There is definitely a future here, and OLA provides the missing constraint-feedback path that is critical to this success.

An OLA example

Consider a D (master-slave, rising-edge-triggered) flip-flop. Its output, Q, goes to a canonical cloud of gates, "Sarah" (for simplicity, one gate). The output of "Sarah" goes to two other canonical forms, "George" and "Harry" (for simplicity, one gate each). Harry feeds the D input of the next flip-flop, and there is a set-up requirement. There's one common clock and, just for simplicity, no skew in the clock delivery circuit (see Figure 3).

Figure 3 - Simplifying things
The second part of the graphic is what an intelligent synthesis tool will do to the first part. It removes the problem of the capacitive fanout, which in turn slows the net.

The question is this: from the rising clock edge at the first Dff to the rising clock edge at the second Dff (ignoring the setup and hold time at the first Dff), is there enough time for the propagation delay to Q of the first Dff, down the wire to Sarah, through Sarah, down the wire to Harry (what did George do?), through Harry, down the wire to the second Dff (don't forget the setup time), and what timing margin (slack, hopefully positive) remains? This is the calculation that OLA controls, proceeding from wherever the necessary information resides. We will assume this is the post-placement, post-route sign-off.
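The slack arithmetic itself is simple once accurate delays are available; the sketch below just sums hypothetical post-route numbers for the Figure 3 path and compares them against the clock period minus the second flip-flop's setup time. Every number here is made up for illustration.

#include <stdio.h>

int main(void)
{
    /* Hypothetical post-route delays for the path in Figure 3 (ns). */
    double clk_period    = 2.00;
    double clk_to_q      = 0.35;  /* first Dff clock-to-Q              */
    double wire_to_sarah = 0.12;
    double sarah_delay   = 0.28;
    double wire_to_harry = 0.15;  /* George only adds load on this net */
    double harry_delay   = 0.30;
    double wire_to_dff2  = 0.10;
    double setup_dff2    = 0.20;

    double path  = clk_to_q + wire_to_sarah + sarah_delay
                 + wire_to_harry + harry_delay + wire_to_dff2;
    double slack = clk_period - setup_dff2 - path;

    printf("path = %.2f ns, slack = %.2f ns (%s)\n",
           path, slack, slack >= 0.0 ? "met" : "violated");
    return 0;
}

The hard part is not this sum; it is getting each of those delay terms right, which is where the call-and-callback mechanism comes in.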

In OLA, any such delay is a computation achieved through a sequence of calls and callbacks, which are evaluated in a stack-like data structure (see Figure 4). It should be emphasized that these callback mechanisms are computationally different from subroutine calls, where the values of all parameters needed to execute the subroutine's algorithm are passed at the time of the call. If the application or the library expects that some of the information might be needed frequently, that particular calculated result can be cached, which saves re-computation when it is needed again.
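The following sketch illustrates the call/callback split and the caching idea in miniature. It is not the actual OLA API; it is an assumed shape in which a delay routine asks the application for a net's load capacitance through a callback and caches the computed delay per net.

#include <math.h>
#include <stdio.h>

typedef double (*load_cap_cb)(int net_id);   /* application-side callback */

static double cached_delay[4];               /* one cache slot per net    */
static int    cache_valid[4];

/* "Library side": computes a delay, asking the application for the
   load capacitance only when it is actually needed. */
static double gate_delay(int net_id, double drive_r, load_cap_cb get_cap)
{
    if (cache_valid[net_id])
        return cached_delay[net_id];         /* reuse earlier computation */

    double c = get_cap(net_id);              /* callback into the tool    */
    double t = drive_r * c * log(2.0);       /* 50% crossing of an RC net */

    cached_delay[net_id] = t;
    cache_valid[net_id]  = 1;
    return t;
}

/* "Application side": answers the parasitic question per net. */
static double app_load_cap(int net_id)
{
    static const double cap[4] = {150e-15, 60e-15, 220e-15, 90e-15};
    return cap[net_id];
}

int main(void)
{
    printf("net 2 delay  = %.1f ps\n", gate_delay(2, 400.0, app_load_cap) * 1e12);
    printf("cached reuse = %.1f ps\n", gate_delay(2, 400.0, app_load_cap) * 1e12);
    return 0;
}

The second call returns the cached value without re-asking the application, which is the trade-off the next paragraph weighs against raw run time.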

There is a basic engineering trade-off involved in the design of the EDA tool: how important is "run quickly" versus arriving at the "right answer" (close enough)? Beta testing has shown some interesting extreme results, from "OLA runs 8x slower than the SDF path" (which turned out to be a user-induced cockpit problem) to "OLA runs 10x faster than the SDF path" (again, a user-induced cockpit problem). The fact of the matter is that delay must be calculated in some tool. If you draw the circle to include the calculation tool and the modified files that must cross that circle as design iterations are performed, and then compare that with the OLA tool's performance, you'll find that there isn't a great deal of difference.

A fairer way to sort the results is to count the times OLA took 20 percent longer but fixed the problem with tool-internal design iterations back through re-synthesis of the problem nets, versus the other approach, which ran faster but couldn't fix the problem without calling for help from the designer. In the end, the latter method is the considerably slower one.

Figure 4 - Problem statement
The application exchanges calls and returns with the Delay Power Calculation Module through the API.

A standard problem solver

IC companies still benefit from the opportunity to generate only one library format, which can then be used by all of the tools that need it. Ideally, the same library supports the tools at every level of the design flow: synthesis (at the architectural and register-transfer levels), cell selection, placement and routing, and verification. The timing (and power-dissipation) data reported is consistent with the data used by the different EDA tools at the various stages of the design. This reduces the need for human design iterations and significantly aids the end user (the designer).

The library data and the computational routines in the OLA model are in compiled form. This protects the intellectual property (IP) within large macros or cores; it's possible to design for the IP market while maintaining proprietary rights to the design details. The delay computation and the library are integrated, eliminating the confusion and errors that arise from delay-calculation algorithm differences between the different EDA tools using the library. The library is an executable model, allowing licensing agreements to be implemented, such as those supported by FLEXlm [from globetrotter.com], where access restrictions are enforced down to cells, cores, and even delay or signal-integrity algorithms.

The OLA approach can answer many important design queries: "Have any static or dynamic hazards been created, or setup-and-hold violations inserted, by the next stage of the design flow? If so, can we take that back to synthesis to fix structural problems without bothering the designer with mundane details?" Solving problems like these would be a significant step toward reducing DSM closure time.

OLA is a standard, it works, and it delivers.


Keith Peshak is senior technical specialist at Silicon Integration Initiative, Inc. (Austin, TX).

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to sdean@cmp.com.

